Fast Fisher Matrices and Lazy Likelihoods 
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Theoretical studies in gravitational wave astronomy often require the calculation of Fisher In- 
formation Matrices and Likelihood functions, which in a direct approach entail the costly step of 
computing gravitational waveforms. Here I describe an alternative technique that sidesteps the need 
to compute full waveforms, resulting in significant computational savings. I describe how related 
techniques can be used to speed up Bayesian inference applied to real gravitational wave data. 
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Over the past two decades there have been literally 
hundreds of papers written describing parameter estima- 
tion studies in gravitational wave astronomy (we have 
to something to pass the time while waiting for the first 
detection). See Refs. [ll-Q for some important early ex- 
amples. The set-up is as follows: a waveform family 
h+,hx describing the radiation produced by some po- 
tential source of gravitational waves is chosen, and the 
detector response function is specified. From these in- 
puts the waveform templates ft.(x) describing the signals 
produced in the detector for model parameters x can be 
computed. The output of the detector - or a network of 
detectors, it makes no difference - can then be written as 
s{t) = /i(x, i) -|- n{t), where n{t) is the instrument noise. 
For theoretical studies it is usually assumed that the noise 
is Gaussian with a colored spectrum S'„(/), making it 
advantageous to shift the analysis to the Fourier domain 
where the noise samples are uncorrelated. The question 
is then asked, how well can the parameters x be con- 
strained by the data? The answer follows from consider- 
ing how well waveforms h{y) with parameters y are able 
to fit the data. The goodness of fit is found by taking the 
squared difference between the model and data, scaled by 
the noise level: 



X^{y)^{-s-h{y)\.s-h{y)), 
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where the brackets denote the standard noise weighted 
inner product 
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The likelihood that the data would arise from a signal 
with parameters y is then [2| 



p(s|y) = Ce->^'(y)/" , 



(3) 



where C is a constant that does not depend on the signal. 
The posterior probability, p(y|s), is simply the product 
of the prior probability, p(y), and the likelihood, p{s\y), 
divided by an overall normalization factor (the evidence) . 
Notice that I never mentioned Mr. Wiener or matched 
filtering. Gravitational wave data analysis is no different 
than any other model fitting procedure: you compute a 
goodness of fit and turn the crank (but to confound as- 
tronomers in other fields we better keep on talking about 
matched filtering). Whatever you do, don't start refer- 
ring to the analysis as homodyne detection or the radio 



astronomers will descend on our field like a plague of lo- 
custs. 

The parameter recovery accuracy is computed by look- 
ing at contours of the posterior. For strong signals the 
central limit theorem says that the posterior distribution 
is well approximated by a multivariate Gaussian distri- 
bution: 
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where Tij is the Fisher information matrix and Ax* = 
y^ — x^ . It follows that the Fisher matrix is given by 
the expectation value of the negative Hessian of the log 
posterior density: 

V,j = -{d,d, lnp(x|s)) = (hjh^^) - d,d, Inp(x) . (5) 

For sufficiently large signal-to-noise ratios, the variance- 
covariance matrix C*^ = {Ax^Ax-') is given by the inverse 
of the Fisher information matrix. Better error estimates 
can be found by directly estimating the posterior distri- 
bution function using Markov Chain Monte Carlo 0, Q 
or Nested Sampling [3, U^ techniques. Either way, these 
parameter estimation studies require that we compute 
a large number of noise weighted inner products, {a\b), 
which would seem to imply that we need to compute a 
large number of waveforms. But that turns out not to be 
the case. 

Suppose that we want to map out the posterior using 
a Markov Chain or Nested Sampling. The first thing to 
realize is that you do not need to add simulated noise 
to the data [11|. This will only push the recovered re- 
covered parameters off from the injected parameters in 
a way that depends on the particular noise realization 
(later on I will explain how to generalize the approach to 
handle data with noise). What we are really interested 
in is the allowed spread in recovered parameters, and 
that is set by the noise level Sn{f) in the noise weighted 
inner product. So the quantity we need to compute is 
X^(y) = [h — h'\h — h') where I'm using the shorthand 
h ~ h{x.) and h' = h(y). Suppose for the moment that 
we happen to have stationary phase approximation wave- 
forms for h and h', we'll deal with time domain wave- 
forms in a moment. Writing h = A{f) exp(<I>(/)) and 
// = A'(/)cxp($'(/))weget 



x'(y) 



A^ + A'^ - 2AA' cos(A$) 



df, (6) 



where it is understood that aU of the quantities are fre- 
quency dependent. The first two terms in the integrand 
are always slowly varying function of frequency, and these 
terms can be integrated using a small number of func- 
tion calls. The oscillatory term that comes from {h\h') 
deserves more attention. So long as we are near maxi- 
mum likelihood, the phase difference A$ — $(/) — $'(/) 
will be a slowly vary function of /, as will cos(A$), and 
once again the integral can be computed to the desired 
accuracy with very few function calls. As we move away 
from maximum likelihood the phase difference grows and 
evaluating the integral takes more function calls. But we 
are not very interested in regions with low likelihood. 
It is simple to show that the variance of x^ is equal to 
the dimension D of the parameter space, and it follows 
that a Markov chain will rarely accept moves to places 
with x^ > 3-D. Even there the phase evolution is not 
large, and the likelihood calculation remains inexpensive. 
Of course, moves will be proposed to locations with low 
likelihoods, and these locations do lead to very rapid os- 
cillations in the {h\h') terms. Which is why we simply 
set {h\h') = whenever the phase change across the band 
exceeds some threshold (say tens of radians) . The results 
of a MCMC or Nested Sampling analysis done in this way 
is indistinguishable from what you get when computing 
the likelihood directly. 

The same technique can be used to compute the Fisher 
matrix if we write 
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with (no sum on i) 

Ahi = /i(x + e*ej) — h{x — e'e^) , 



(7) 



(8) 



Expanded out, the numerical central differences in equa- 
tion ^ lead to 2D^ + D inner products that have to be 
evaluated. Since the e* are by definition small, the in- 
tegrands are all slowly varying and can be computed to 
the necessary accuracy with a small number of function 
evaluations. The calculation is even simpler if we work 
directly with the derivatives of the amplitude and phase: 
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Again, all the quantities in the integrand are slowly vary- 
ing functions of frequency, and the integrand can be com- 
puted at little cost. This method of evaluating Fisher 
matrices has the added advantage that it is numerically 
far more stable than directly taking derivatives of the 
waveforms [12| . 

For the methods I have described to be useful the sig- 
nals must have a discrete collection of instantaneous fre- 
quency components. When this condition is met, it is 
usually possible to derive stationary phase approximation 
waveforms [13j . In some instances the orbital timescale 
of the detector is comparable in duration to the gravi- 
tational wave signal {e.g. most LISA, DECIGO or Ein- 
stein Telescope sources, signals from deformed neutron 



stars for LIGO/Virgo), and it is necessary to compute a 
stationary phase approximation to the detector response 
as well Q. These calculations are straightforward, and 
including finite arm length effects is only a minor com- 
plication. 

We can avoid the additional step of computing sta- 
tionary phase approximations to the signal by invoking 
Parseval's theorem and working directly in the time do- 
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(10) 



where the instantaneous frequency is given by the time 
derivative of the phase: f{t) — dt^{t)/{2TT). Here I 
am assuming that the signal can be written in the form 
h{t) = A(i)cos$(t), where A{t) is a slowly varying am- 
plitude (in other words, the same condition that is re- 
quired for the stationary phase approximation to work). 
Using the product formula: cos a cos /3 = (cos(a + f3) + 
cos(a — /3))/2, and dropping the rapidly oscillating term 
involving the sum of the phases, we have 



{h\h') 
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dt. 



(11) 



Again, for small phase differences, all the terms in the in- 
tegrand are slowly varying, and the integral can be com- 
puted with a small number of function calls. Fisher ma- 
trix elements can be computed using the the time domain 
analog of equation © . 

In many cases the gravitational wave signals have 
power spread over multiple harmonics. For a Fisher ma- 
trix analysis it is enough to simply add together the con- 
tributions from each harmonic, but for likelihood calcu- 
lations there is t he p ossibility that different harmonics 
might overlap [ij, [la| . To account for this possibility we 
need to compute 
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where the i, j label the harmonics. Only those terms with 
small phase differences need to be computed. 

The techniques described here are applicable to an- 
alytic or semi-analytic waveform models such as those 
derived using the post-Newtonian [l^ or Effective One 
Body [l7| approaches. These waveforms may require the 
numerical integration of energy fiuxes and/or spin preces- 
sion equations. An effective strategy for handling these 
cases is to include the calculation of the noise weighted 
inner products as differential equations, d{a\b)/dt, and to 
solve the couple set of differential equations using a high 
order adaptive solver. This approach works very well for 
spinning systems since the precession timescale sets the 
amplitude evolution timescale. Because the orbital decay 
and spin precession timescales are generally much slower 
than the orbital timescale, there are substantial savings 
to be had from computing the likelihood and Fisher ma- 
trix elements in this way. If the evolution equations hap- 
pen to be "stiff" , and require small step sizes, it may 



be advantageous to compute the derivatives directly. For 
example, rather than integrating the equation 



da{x,t) 
di 



a(x,t). 



(13) 



for different values of x to estimate the derivatives a^i, 
it is usually faster and more accurate to integrate the 
equations 



da,i(x,t) 
di 



,(x,t). 



(14) 



There are many applications for these techniques out- 
side of theoretical parameter estimation studies. While 
the techniques described (so far) are not directly appli- 
cable when instrument noise is present, they can play 
a role in setting up template grids or driving stochas- 
tic searches. For example, in MCMC style searches it 
is important to have well chosen proposal distributions. 
The best are those that closely approximate the posterior 
distribution. The Fisher Information matrix provides a 
local approximation to the posterior, and multi-variate 
normal distributions of the form ffl have proven useful 
as proposal distributions [Tj, [T5|, \1^. However, if the 
Fisher matrix computation uses the full waveforms, it 
costs D{D + l)/2 times more to compute than the likeli- 
hood, and at that point you are better off using a cheaper 
proposal distribution and taking more steps in the chain. 
On the other hand, if the Fisher matrix elements can be 
computed at a fraction of the cost of the full waveforms, 
it makes sense to use (|1]) as a proposal distribution. Now 
suppose that the search has found a mode of the poste- 
rior at X, which may or may not be the primary mode. 
To fully explore all the modes it helps to have a crude 
map of the global structure of the posterior. We can gen- 
erate such a map at moderate computational cost. The 
procedure is to run a new MCMC search using the noise- 
free chi-squared x^ (y) = {h{x)~h{y)\h{x.) — h{y)), which 
can be computed for a fraction of the cost of the true chi- 
squared. If /i(x) happens to correspond to a secondary 
mode of the true posterior, one of the secondary modes 
of the approximate posterior will correspond to the pri- 
mary mode of the true posterior. Thus, once the original 
search finds any hint of the signal, be it a secondary or 
tertiary mode, we can develop a map that points us to- 
wards the primary mode. Because noise will push the 
true posterior distribution away from the noise-free ap- 
proximation, it is a good idea to temper the approximate 
posterior: x^(y) ~^ X^(y)/^ using some "temperature" 
T > 1. Increasing the temperature flattens the distribu- 
tion, and we have to find a balance between making the 
approximate map uninformative (T — > oo), and being 
overly prescriptive (T — >■ 1). Temperatures in the range 
T e [3,5] have been found to work quite well. 

The same techniques used to speed up the likelihood 
and Fisher matrix calculations can also be used to speed 
up the computation of the template metrics [l9| gij for a 
grid based search. This should prove especially useful in 



high dimensional spaces where random template place- 
ment algorithms are required [20|. In particular, fast 
metric calculations are needed for hybrid grid-MCMC 
searches |15j where the physical priors are replaced by 
p(x) ~ y/g{x). This hybrid search is equivalent to a 
random grid search when no signals are present, which 
ensures full coverage of the search space, and has the ad- 
vantage of being significantly faster than a grid search 
when signals are present (the saving grows with the sig- 
nal strength). 

In essence, the time saving techniques that I have de- 
scribed all amount to heterodyning of the data. That is, 
if we have signals /i(x) and h{y) that differ by a small 
phase difference, their product yields a low frequency 
beat signal plus a high frequency signal that can be dis- 
carded without loss of information. The frequencies of 
the signals do not have to be constant for heterodyning 
to work. Heterodyning has been used in the search for 



gravitational wave signals from known radio pulsars [21 



and to simulate LISA observations of white dwarf bina- 
ries [22. What apparently has not been realized before 
is that heterodyning can be used to significantly speed 
up MCMC and Nested Sampling explorations of the pos- 
terior for signals embedded in noisey instrument data. 
Suppose that the primary or a secondary mode of the 
posterior x has been located by some search algorithm 
and you would now like to fully map out the posterior 
distribution. Rather than work with the full signal s{t), 
first Fourier transform the data, whiten using the noise 
spectral density S'„(/), and heterodyne using the carrier 
phase $(x, /). The high frequency components of the 
data can now be thrown away, and the noise weighted 
inner products can be computed using templates that 
are heterodyned against the carrier phase. The band- 
width of the heterodyned signal that needs to be kept 
depends on the details of the analysis, but the data vol- 
ume will typically be reduced by many orders of mag- 
nitude. To gain the full benefit from this approach the 
heterodyned templates must be computed directly at low 
cadence using the phase difference A<I>(/) (or the equiv- 
alent in the time domain). Note that carrier phase that 
beats with the signal at the primary maximum will also 
beat with the signal at the secondary maxima. The likeli- 
hoods computed far from the maxima will not agree with 
those computed using the full signal, but if the hetero- 
dyned signal is given sufficient bandwidth, the differences 
will be small in regions with noticable posterior weight. 
As a concrete example, consider a binary neutron star 
inspiral that might be observed by the LIGO/Virgo de- 
tectors. The gravitational wave signal in each detector 
can be written as 

h{t) = F+h+{t)+F''h^{t) 

= i^+A+(t)cos$(t)+F^Ax(t)sin$(t). (15) 

The antenna patterns F+, F^ depend on the sky loca- 
tion and polarization angle. The amplitudes A^{t) and 
Ax (t) depend on the merger time, the orbital inclina- 
tion, the masses and the distance to the binary. The 




FIG. 1: Heterodyned and whitened neutron star binary 

inspiral signals and instrument noise. The upper and 

lower panels show the sine and cosine components, 

respectively. The solid (red) line is the instrument 

noise, the dashed (blue) line is the injected signal and 

the dotted (black) line is for a template generated with 

the individual masses changed by —3% and +1%. 



phase <f>(t) depends on the sky location (which deter- 
mines the time delay between the signals seen at the dif- 
ferent sites), merger time and the masses. Here I am 
using 2-PN waveforms without spin. The heterodyning 
can be done in the time domain by multiplying the signal 
by cos0(i) and sin9(i), where Q{t) is the reference, or 
carrier phase corresponding to the best fit source parame- 
ters. For a f .4Mo-f .3Af0 inspiral, the original waveforms 
reached a maximum frequency of 1628.5 Hz. The signal 
was whitened and heterodyned, then lowpass filtered to 
a maximum of 10 Hz. Low cadence templates for the sine 



and cosine streams can then be generated: 



1 



hcosit) = -(F+A+(i)cosA$(i)+FMx(i)sinA$(t)) 
hsinit) = i(-^+A+(t)sinA$(i)+fMx(t)cosA$(i)) 



(16) 



where A4>(i) = $(i) -8(i). Figure [T] shows the whitened 
and heterodyned time domain signal and noise for a neu- 
tron star binary with network SNR — 15. We used the 
injected source parameters to generate the carrier phase 
$(t). Also shown is a template generated using (fT6|) with 
the masses shifted by —3% and -1-1%. The noise weighted 
inner products are computed directly in the time domain: 



{h\s) 



hsin{t)Ssin{t) + hcosit)Scos{t) 



Snifit)) 



dt. 



(17) 



For the case shown. 



("■inji'iinj 



(/l|/linj) = 27.9, , (/linjlTl) 



) = 225.0, {h\h) = 231.2, 
= -14.1, and [h\n) = 12.5. 



The inner products computed using the full cadence data 
agree with those computed using the heterodyned data to 
~ ±0.01. The only difference is that the heterodyned in- 
ner products can be computed one thousand times faster. 
Hopefully the techniques I have described will help re- 
duce the high computational cost of Bayesian approaches 
to gravitational wave data analysis, and allow their wider 
adoption. 
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